Notation-based Semantification
Authors
Abstract
The scientific community produces a large number of mathematical papers (approximately 108,000 new papers per year [arX]), which raises the importance of machine-based processing of such documents. Unfortunately, the most popular formats in which these papers are written (for instance, LaTeX) do not contain much information that would allow computers to infer the human-understandable knowledge contained in a paper. Since changing these formats is not practical at this point, the alternative is to add semantics to the existing documents by translating them into a more suitable format, for instance, Content MathML.
Scientific publishing has gone through a series of evolutions in search of the best method of writing scientific documents. This process was strongly influenced by the invention and spread of the internet. Scientists understood the need for a standard that could help them write and exchange their findings efficiently. A lot of effort has gone into translating books into digital documents. The next step in this evolution is translating digital documents into knowledge-rich digital documents. This step can only happen if a feasible means of transition appears, which MathSemantifier attempts to provide.
Ambiguity is one of the main reasons semantification is complex. Mathematical documents are not a simple collection of symbols; they become useful only when their intended semantics is accessible. However, authors tend not to write down the whole graph and instead rely on implicit human knowledge to decipher these documents. This is where ambiguity comes into play: the author relies on the reader's ability to use the context of the document to pinpoint the actual meaning of an expression. Ambiguities can be broadly divided into two kinds: structural and idiomatic.
A simple example of a structural ambiguity is “sin x / 2”, which admits more than one reading, for instance sin(x / 2) or (sin x) / 2.
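As a rough illustration of why the choice of reading matters, the following Python sketch encodes two plausible readings of “sin x / 2” as small prefix-notation trees and evaluates both at the same x. The tuple encoding, the READINGS table, and the evaluate helper are assumptions made for this example only; they are not MathSemantifier's representation and not Content MathML itself.

```python
import math

# Hypothetical prefix-notation trees for two readings of "sin x / 2".
# Each tree is (operator, argument, ...); leaves are variable names or numbers.
READINGS = {
    "sin(x / 2)": ("sin", ("divide", "x", 2)),
    "(sin x) / 2": ("divide", ("sin", "x"), 2),
}


def evaluate(tree, env):
    """Evaluate a prefix-notation tree given variable bindings in env."""
    if isinstance(tree, str):           # a variable such as "x"
        return env[tree]
    if isinstance(tree, (int, float)):  # a numeric literal
        return tree
    op, *args = tree
    values = [evaluate(arg, env) for arg in args]
    if op == "sin":
        return math.sin(values[0])
    if op == "divide":
        return values[0] / values[1]
    raise ValueError(f"unknown operator: {op}")


def evaluate_readings(x):
    """Return the value of every reading for the same input x."""
    return {label: evaluate(tree, {"x": x}) for label, tree in READINGS.items()}


if __name__ == "__main__":
    for label, value in evaluate_readings(math.pi).items():
        print(f"{label:12s} -> {value:.6f}")
```

The same surface string yields two different trees and, for most inputs, two different values, which is why a semantification step has to commit to one structure rather than just copy the symbols.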
Similar resources
CAPLAN: An Accessible, Flexible and Scalable Semantification Architecture
The popularity of semantic information systems requires more data to be semantically prepared. However, the subsequent semantification process is still reserved for experts in Natural Language Processing. In this paper we define requirements for a state-of-the-art semantification architecture. Additionally we present a concept for a new semantification architecture meeting these requirements. K...
The Revival of Subject Analysis: A Knowledge-based Approach facilitating Semantic Search
Semantic Search emerged as the new system paradigm in enterprise information systems. However, usually only a small amount of textual enterprise data is semantically prepared for such systems. The manual semantification of these resources is typically a time-consuming process. The automatic semantification requires deep knowledge in Natural Language Processing. Therefore, in this paper we presen...
Comparison of the learning of two notations: A pilot study
Introduction: MICAP is a new notation in which the teeth are indicated by letters (I-incisor, C-canine, P-premolar, M-molar) and numbers [1,2,3] which are written superscript and subscript on the relevant letters. FDI tooth notation is a two digit system where one digit shows quadrant and the second one shows the tooth of the quadrant. This study aimed to compare the short term retention of knowledge...
Discovering similar Twitter accounts using semantics
On a daily basis, millions of Twitter accounts post a vast number of tweets including numerous Twitter entities (mentions, replies, hashtags, photos, URLs). Many of these entities are used in common by many accounts. The more common entities are found in the messages of two different accounts, the more similar, in terms of content or interest, they tend to be. Towards this direction, we introduce...
On the Semantification of 5-Star Technical Documentation
Technical documentation is a special purpose content describing machines and plants with high complexity. The documentation covers operation, maintenance and repair of the technical artifacts. The high complexity of the machines yields a voluminous documentation, where it increasingly becomes difficult to find the relevant information for a given problem. The paper discusses the use of semantic...